CUDA Programming Guide: Fundamentals of CUDA Kernel Development

CUDA kernel development begins with defining a kernel: a specialized C++ function designed to run in parallel across a large number of cores on an NVIDIA GPU. These functions are the fundamental unit of work in the CUDA programming model, acting as the bridge between the host's serial logic and massively parallel execution on the device.

1. The __global__ Specifier

The __global__ declaration specifier is the mandatory qualifier for a kernel: it tells the compiler to generate code for the GPU while keeping the function's entry point visible to the CPU. Functions that execute on the GPU and can be invoked from the host are called kernels.
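As a minimal sketch of how a __global__ function is declared and launched from host code, consider the example below. The kernel name fillIndices, the helper runFillIndices, and the launch configuration of 128 threads per block are illustrative assumptions, not part of the original lesson.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: __global__ marks it as GPU code callable from the host.
// Each thread writes its own global index into out[].
__global__ void fillIndices(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {            // guard: the last block may have spare threads
        out[i] = i;
    }
}

// Hypothetical host helper: launches the kernel and checks the output.
bool runFillIndices(int n) {
    int *dOut = nullptr;
    if (cudaMalloc(&dOut, n * sizeof(int)) != cudaSuccess) return false;

    int threads = 128;                            // assumed block size
    int blocks = (n + threads - 1) / threads;     // round up to cover n
    fillIndices<<<blocks, threads>>>(dOut, n);    // host-side kernel launch
    cudaError_t err = cudaGetLastError();         // catches launch errors

    std::vector<int> hOut(n);
    if (err == cudaSuccess) {
        // cudaMemcpy waits for the kernel before copying the results back
        err = cudaMemcpy(hOut.data(), dOut, n * sizeof(int),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(dOut);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i) {
        if (hOut[i] != i) return false;
    }
    return true;
}

int main() {
    bool ok = runFillIndices(1000);
    std::printf("%s\n", ok ? "kernel output verified" : "kernel run failed");
    return ok ? 0 : 1;
}
```

The triple-angle-bracket syntax is what distinguishes a kernel launch from an ordinary function call; compiling this sketch requires nvcc and a CUDA-capable device.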

2. Execution Environment

Kernels are dispatched to and executed on Streaming Multiprocessors (SMs). The SM is the main compute engine inside an NVIDIA GPU and is responsible for managing hundreds of concurrent threads. Each SM manages thread blocks and schedules them onto its processing cores.
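To make the SM concrete, the following sketch queries how many SMs the current GPU has through the CUDA runtime call cudaGetDeviceProperties. The helper name smCount and the choice of device 0 are assumptions made for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the number of SMs on the given device,
// or -1 if the query fails (for example, when no GPU is present).
int smCount(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return -1;
    }
    return prop.multiProcessorCount;
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("no CUDA device available\n");
        return 1;
    }
    std::printf("device: %s\n", prop.name);
    std::printf("streaming multiprocessors: %d\n", smCount(0));
    std::printf("max resident threads per SM: %d\n",
                prop.maxThreadsPerMultiProcessor);
    return 0;
}
```

The reported values vary by GPU model, so the output is a property of the hardware rather than of the program.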

Syntax rule: Kernels must return void. Because they run asynchronously with respect to the host, they cannot return a value directly to the CPU; they must write their results back to memory allocated on the device.
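The void rule can be sketched as follows: instead of returning a value, the kernel writes it through a pointer to device memory, and the host copies it back once the kernel has finished. The names addKernel and addOnDevice are hypothetical, and a single-thread launch is used only to keep the example small.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: returns void and delivers its result through a
// pointer to device memory instead of a return value.
__global__ void addKernel(int a, int b, int *result) {
    *result = a + b;
}

// Hypothetical host helper: returns true on success and stores the
// device-computed sum in *hostResult.
bool addOnDevice(int a, int b, int *hostResult) {
    int *dResult = nullptr;
    if (cudaMalloc(&dResult, sizeof(int)) != cudaSuccess) return false;

    addKernel<<<1, 1>>>(a, b, dResult);   // launch returns immediately
    cudaError_t err = cudaGetLastError(); // check the launch itself
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();    // wait for the kernel to finish
    }
    if (err == cudaSuccess) {
        err = cudaMemcpy(hostResult, dResult, sizeof(int),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(dResult);
    return err == cudaSuccess;
}

int main() {
    int sum = 0;
    if (!addOnDevice(2, 3, &sum)) {
        std::printf("device computation failed\n");
        return 1;
    }
    std::printf("sum computed on the GPU: %d\n", sum);
    return 0;
}
```

The explicit cudaDeviceSynchronize call is there to make the asynchronous launch visible; the blocking cudaMemcpy that follows would also wait for the kernel on the default stream.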

QUESTION 1

What is the primary function of the __global__ specifier?

It defines a function that runs on the CPU but is callable from the GPU.

It defines a kernel that runs on the GPU and is callable from the CPU.

It allocates memory on the GPU's SM cache.

It synchronizes all threads in a block.

Answer: It defines a kernel that runs on the GPU and is callable from the CPU. __global__ identifies entry-point kernels that execute on the GPU and are launched from Host code.

QUESTION 2

Why must CUDA kernels return void?

Because they execute asynchronously and have no direct path to return values to the Host thread.

To save registers on the SM.

Because GPU memory is read-only.

The NVCC compiler does not support float returns.

QUESTION 3

Which hardware component is responsible for managing and executing threads in a CUDA kernel?

The PCIe Controller.

The Streaming Multiprocessor (SM).

The Host RAM controller.

The BIOS.

QUESTION 4

What happens when a Host calls a kernel function?

The CPU halts until the GPU finishes processing.

The GPU creates a clone of the function for every available SM.

The kernel is enqueued for execution on the GPU, and the CPU continues to the next instruction.

The CPU performs a context switch to the GPU.

QUESTION 5

Which of the following is the correct definition of a CUDA kernel?

A function that executes on the GPU and is invoked from the Host.

A C++ library for file I/O.

A hardware driver for NVIDIA GPUs.

A standard CPU function with the __gpu__ prefix.
